NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks

The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering, to address the incompleteness of the prior knowledge, and a biologically justified model of gene expression (a sparse Network Component Analysis-based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis followed by a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results have not only extensively validated the GRN we built, but also provide a benchmark that our community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.


Supplementary Note 1: GPALM Problem Introduction
We extend the original PALM algorithm [2] and propose the GPALM algorithm, which can solve more general problems. The form of the problems that GPALM can solve is explained in this section; the actual algorithm and its convergence proof are provided in Supplementary Note 2.

The problem and basic assumptions
We are interested in solving the non-convex and non-smooth minimization problem (M), in which the objective $\Psi(B)$ over the blocks $B = (X, Y, Z)$ is the sum of a continuously differentiable coupling function $H$ and proper lower semicontinuous (PLS) block terms. The assumptions for problem (M) are collected in Assumption 1: each block term is a PLS function bounded from below; $H$ is continuously differentiable with $\nabla H$ Lipschitz continuous on bounded subsets; and, for any fixed $X$, the partial function $Y \mapsto \Psi(X, Y, Z)$ has a Lipschitz continuous gradient whose modulus $L_Y(X)$ may depend on the fixed blocks (and likewise for the other blocks).

Subdifferentials of nonconvex and nonsmooth functions

Definition 1. Let $\sigma : \mathbb{R}^d \to (-\infty, +\infty]$ be a PLS function. For a given $x \in \operatorname{dom} \sigma$, the Fréchet subdifferential of $\sigma$ at $x$, written $\hat{\partial}\sigma(x)$, is the set of all vectors $u \in \mathbb{R}^d$ which satisfy
$$\liminf_{y \to x,\, y \neq x} \frac{\sigma(y) - \sigma(x) - \langle u, y - x \rangle}{\|y - x\|} \ge 0.$$
When $x \notin \operatorname{dom} \sigma$, we set $\hat{\partial}\sigma(x) = \emptyset$.
Proposition 1 can be proved based on Definition 1.
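For illustration, take $\sigma(x) = |x|$ on $\mathbb{R}$. At $x = 0$, Definition 1 gives
$$\hat{\partial}\sigma(0) = \Big\{ u \in \mathbb{R} : \liminf_{y \to 0,\, y \neq 0} \frac{|y| - u y}{|y|} \ge 0 \Big\} = [-1, 1],$$
since $|y| - u y \ge (1 - |u|)\,|y| \ge 0$ whenever $|u| \le 1$, while for $|u| > 1$ the choice $y = t \operatorname{sign}(u)$ with $t \downarrow 0$ drives the quotient to $1 - |u| < 0$.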

Proximal map
Let $\sigma : \mathbb{R}^d \to (-\infty, +\infty]$ be a PLS function. Given $x \in \mathbb{R}^d$ and $t > 0$, the proximal map associated with $\sigma$ is defined by
$$\operatorname{prox}_t^{\sigma}(x) := \operatorname*{argmin}_{u \in \mathbb{R}^d} \Big\{ \sigma(u) + \frac{t}{2} \|u - x\|^2 \Big\}.$$
The proximal map has the following important property (Lemma 3.2 in [2]).
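For a concrete instance, when $\sigma(u) = \lambda \|u\|_1$ the proximal map has the closed-form soft-thresholding solution; a minimal Python sketch (an illustration, not part of the GPALM implementation):

```python
import numpy as np

def prox_l1(x, lam, t):
    """Proximal map of sigma(u) = lam * ||u||_1 with weight t:
    argmin_u lam * ||u||_1 + (t / 2) * ||u - x||^2,
    i.e., soft thresholding with threshold lam / t."""
    return np.sign(x) * np.maximum(np.abs(x) - lam / t, 0.0)
```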

Lemma 1.
Let $h : \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function whose gradient $\nabla h$ is Lipschitz continuous with modulus $L_h$, and let $\sigma : \mathbb{R}^d \to (-\infty, +\infty]$ be a proper and lower semicontinuous function with $\inf_{\mathbb{R}^d} \sigma > -\infty$. Fix any $t > L_h$. Then, for any $u \in \operatorname{dom} \sigma$ and any $u^+ \in \mathbb{R}^d$ defined by
$$u^+ \in \operatorname{prox}_t^{\sigma}\Big( u - \frac{1}{t} \nabla h(u) \Big),$$
we have
$$h(u^+) + \sigma(u^+) \le h(u) + \sigma(u) - \frac{1}{2}(t - L_h) \|u^+ - u\|^2.$$

Supplementary Note 2: GPALM Algorithm and its Convergence Analysis

The Algorithm

Here we first write out the algorithm (Algorithm 6) that is able to solve problem (M) with a convergence guarantee.
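As a minimal sketch of Algorithm 6's block-proximal structure (with hypothetical gradient and proximal oracles, not the actual GPALM implementation), one round of updates proceeds as follows:

```python
# One PALM-style round of block-proximal updates for problem (M) with
# blocks (X, Y, Z). The oracles grad_H_* and prox_* are placeholders;
# the step sizes mu_* must exceed the corresponding block Lipschitz moduli.
def gpalm_round(X, Y, Z, grad_H_X, grad_H_Y, grad_H_Z,
                prox_f, prox_g, prox_h, mu_X, mu_Y, mu_Z):
    X = prox_f(X - grad_H_X(X, Y, Z) / mu_X, mu_X)  # update X with Y, Z fixed
    Y = prox_g(Y - grad_H_Y(X, Y, Z) / mu_Y, mu_Y)  # update Y with the new X
    Z = prox_h(Z - grad_H_Z(X, Y, Z) / mu_Z, mu_Z)  # update Z with the new X, Y
    return X, Y, Z
```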

Convergence analysis
The proof follows the proofs introduced for the original PALM algorithm [2], which rest on the following theorem [2]: a sequence $(B^k)_{k \in \mathbb{N}}$ converges to a critical point of $\Psi$ if the four conditions below hold.

(i) Sufficient decrease: there exists a positive constant $\rho_1 > 0$ such that
$$\Psi(B^{k+1}) \le \Psi(B^k) - \rho_1 \|B^{k+1} - B^k\|^2;$$
(ii) Relative error: there exists a positive constant $\rho_2 > 0$ such that, for some $w^k \in \partial \Psi(B^k)$,
$$\|w^k\| \le \rho_2 \|B^k - B^{k-1}\|;$$
(iii) Continuity: there exist a subsequence $(B^{k_j})_{j \in \mathbb{N}}$ and $B^*$ such that
$$B^{k_j} \to B^* \quad \text{and} \quad \Psi(B^{k_j}) \to \Psi(B^*) \quad \text{as } j \to +\infty;$$
(iv) KL property: $\Psi$ satisfies the Kurdyka-Łojasiewicz (KL) property in its effective domain.
By the theorem above, we only need to check that the sequence generated by Algorithm 6 satisfies conditions (i)-(iv).
Proposition 2. Algorithm 6 is a globally convergent algorithm.
Condition (i). Based on the update steps (8)-(10) and Lemma 1, each block update decreases the objective; for the $X$ block,
$$\Psi(X^{k+1}, Y^k, Z^k) \le \Psi(X^k, Y^k, Z^k) - \frac{1}{2}\big(\mu_X^k - L_X(Y^k, Z^k)\big) \|X^{k+1} - X^k\|^2,$$
and the analogous inequalities (15)-(17) hold for the $Y$ and $Z$ blocks. Let $B^k = (X^k, Y^k, Z^k)$ and sum equations (15) to (17) to obtain (18). We know that $\mu_X^k$, $\mu_Y^k$, and $\mu_Z^k$ are bounded from below and that $\mu_{Y_i}^k > L_Y(X^{k+1})$. Therefore, there exists $\rho_1 > 0$ such that $\Psi(B^{k+1}) \le \Psi(B^k) - \rho_1 \|B^{k+1} - B^k\|^2$, which proves condition (i).
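As a toy numerical illustration of this sufficient-decrease behavior (an added sketch, not part of the proof), a proximal-gradient run on a lasso objective with any fixed step parameter $t > L_h$ decreases the objective at every iteration, exactly as Lemma 1 predicts:

```python
import numpy as np

# Toy check of Lemma 1's sufficient decrease on the illustrative objective
# h(x) + sigma(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(30, 10)), rng.normal(size=30), 0.1
L_h = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of grad h
t = 1.1 * L_h                      # fix t > L_h, as required by Lemma 1
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()

x = np.zeros(10)
for _ in range(50):
    z = x - A.T @ (A @ x - b) / t                               # gradient step
    x_new = np.sign(z) * np.maximum(np.abs(z) - lam / t, 0.0)   # prox step
    assert obj(x_new) <= obj(x) + 1e-12                         # monotone decrease
    x = x_new
```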

Condition (ii).
Writing down the optimality condition for the update (8), we obtain a subgradient of $\Psi$ with respect to $X$ at $B^k$ whose norm is bounded by a constant multiple of $\|B^k - B^{k-1}\|$; the first inequality comes from the fact that $\nabla H$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^n \times \mathbb{R}^m$, as assumed in Assumption 1 (6). With the optimality condition for (9), we obtain the analogous bound (23) for the $Y$ block; the second inequality utilizes the structure of $F(Y, X)$ introduced in Assumption 1 (2), and the third inequality uses Proposition 1. Arguing as for $X$, the optimality condition for (10) gives the corresponding bound for the $Z$ block. Collecting the three bounds yields $w^k \in \partial \Psi(B^k)$ with $\|w^k\| \le \rho_2 \|B^k - B^{k-1}\|$ for some $\rho_2 > 0$, which proves condition (ii).

Condition (iii). Summing (19) from $k = 0$ to $N - 1$, we obtain (27). Since $\Psi(B^N)$ is decreasing and $\inf \Psi > -\infty$, there exists some $\bar{\Psi}$ such that $\Psi(B^N) \to \bar{\Psi}$ as $N \to +\infty$. Therefore, letting $N \to +\infty$ in (27), we have
$$\sum_{k=1}^{+\infty} \|B^k - B^{k-1}\|^2 < +\infty,$$
which implies that $\lim_{k \to +\infty} \|B^k - B^{k-1}\| = 0$. Let $B^* = (X^*, Y^*, Z^*)$ be a limit point of a subsequence $(B^{k_j})_{j \in \mathbb{N}}$. Letting $j \to +\infty$ in the proximal update inequalities, we get $\limsup_{j \to +\infty} F(Y_i^{k_j}, Z^{k_j}) \le F(Y_i^*, Z^*)$ (32). From the fact that $F$ is a PLS function, we also have $\liminf_{j \to +\infty} F(Y_i^{k_j}, Z^{k_j}) \ge F(Y_i^*, Z^*)$ (33). Based on (32) and (33), we know $\lim_{j \to +\infty} F(Y_i^{k_j}, Z^{k_j}) = F(Y_i^*, Z^*)$. Arguing similarly with $X$, we finally have $\Psi(B^{k_j}) \to \Psi(B^*)$ as $j \to +\infty$, which proves condition (iii).

Condition (iv). The function $\Psi$ is a semi-algebraic function, which automatically satisfies the Kurdyka-Łojasiewicz property [2].

Supplementary Note 3: Parameter Selection for Algorithms used in the study
In this section, we describe how we selected the parameters for the competing algorithms.

Parameter Selection for PriorSum
PriorSum constructs a predicted GRN by summing over the weights from all prior networks $P = \{P^1, \ldots, P^d\}$. That is, PriorSum builds a GRN with $P_{ij} = \sum_k P^k_{ij}$ and does not need to select any parameters.
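A minimal sketch of this summation, assuming each prior network is given as a TF-by-gene weight matrix of the same shape (illustrative names, not the NetREX-CF code):

```python
import numpy as np

def prior_sum(priors):
    """PriorSum: element-wise sum P_ij = sum_k P^k_ij over all prior networks."""
    return np.sum(np.stack(priors), axis=0)
```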

Parameter Selection for LassoStARS
LassoStARS [3] is the latest version of Inferelator; it takes an unweighted prior and gene expression data as input. Because LassoStARS needs an unweighted prior network and the prior networks we have are weighted, we apply cutoffs to construct prior networks for LassoStARS: we assign to each gene the top $N$ TFs based on $P_{ij}$. We tested $N \in \{10, 20, 30, 40\}$, found that $N = 10$ performs best, and report those results in the main paper. LassoStARS provides its own procedure for selecting the optimal values of its remaining parameters, so we did not need to tune them.
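A minimal sketch of this top-$N$ prior construction, assuming $P$ is a TF-by-gene weighted prior matrix (illustrative names, not the actual preprocessing code):

```python
import numpy as np

def top_n_prior(P, n=10):
    """For each gene (column), keep the n TFs with the largest weights as edges."""
    unweighted = np.zeros_like(P, dtype=int)
    top_tfs = np.argsort(-P, axis=0)[:n, :]          # indices of the n largest weights per gene
    unweighted[top_tfs, np.arange(P.shape[1])] = 1   # mark the selected TF-gene edges
    return unweighted
```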

Parameter Selection for MerlinP
For reconstructing the yeast GRN, MerlinP [4] uses the same prior networks and gene expression to build a GRN, which is reported in the repository https://github.com/Roy-lab/merlin-p. We directly downloaded the GRN they built and compared it with the other methods. For reconstructing the S2 cell GRN and the cell-specific GRNs, we followed the instructions provided at https://github.com/Roy-lab/merlin-p.

Parameter Selection for NetREX
NetREX [5] is similar to LassoStARS in that it takes an unweighted prior and gene expression as input. We therefore generate prior networks for NetREX in the same way, assigning to each gene the top $N$ TFs based on $P_{ij}$ (see the sketch above). We set $N \in \{10, 20, 30, 40\}$, found that $N = 20$ performs best, and report those results in the main paper. The other parameters were selected based on the suggestions provided at https://github.com/ncbi/NetREX.

Parameter Selection for CF
We input into CF [6] the summed prior $P_{ij} = \sum_k P^k_{ij}$. We set the dimension of the hidden feature vector to 100, 200, or 300, and the regularization term used by CF to 0.1, 1, 10, or 100. We tried all of these combinations and report the result with the best performance.
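A minimal sketch of this grid search, assuming a hypothetical train_cf(hidden_dim, reg) that fits CF and returns a validation score (not the actual CF API from [6]):

```python
from itertools import product

def select_cf_params(train_cf):
    """Try all (hidden dimension, regularization) pairs and keep the best."""
    candidates = product([100, 200, 300],    # hidden feature dimension
                         [0.1, 1, 10, 100])  # regularization strength
    return max(candidates, key=lambda p: train_cf(hidden_dim=p[0], reg=p[1]))
```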

Parameter Selection for NetREX-CF
Based on the formulation of NetREX-CF (??), we know that we need to select $h$, $\lambda_A$, $\lambda_S$, $\eta_{ij}$, $\lambda$, and $\tilde{C}_{ij}$. Here $h$ is the dimension of the hidden feature vector. We find that $h \in \{100, 200, 300\}$ does not change the performance much; for computational reasons, we set $h = 100$. Because $\lambda_A$ and $\lambda_S$ serve as standard regularization to avoid over-fitting, we set $\lambda_A = 1.0$ and $\lambda_S = 1.0$ by default. We introduce the selection of $\eta_{ij}$ and $\tilde{C}_{ij}$ in the following subsections.
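For reference, the default settings above can be summarized as follows (illustrative parameter names; the actual implementation may expose them differently):

```python
# Default NetREX-CF hyperparameters used in this study.
NETREX_CF_DEFAULTS = {
    "h": 100,         # hidden feature dimension; 100/200/300 performed similarly
    "lambda_A": 1.0,  # standard regularization to avoid over-fitting
    "lambda_S": 1.0,  # standard regularization to avoid over-fitting
}
```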

Selection of $\eta_{ij}$

We need to make sure that $F(S, X, Y)$ is lower semi-continuous. We can first simplify the objective into the form (35). $F(S, X, Y)$ is lower semi-continuous when the coefficient in front of $\|S_{ij}\|_0$ in (35) is larger than 0. After several manipulations, we find the condition that $\eta_{ij}$ must satisfy to keep this coefficient positive, and we set $\eta_{ij}$ accordingly so that $F(S, X, Y)$ is lower semi-continuous.

Selection of $\tilde{C}_{ij}$

$C_{ij}$ is the penalty when we want to use $x_i^T y_j$ to learn $B_{ij} = 1$. Similarly, $\tilde{C}_{ij}$ is the penalty when we want to use $x_i^T y_j$ to learn $\|S_{ij}\|_0 = 1$. There are two situations. First, when $\|S_{ij}\|_0 = 1$ and $B_{ij} = 1$, meaning the sparse NCA-based model confirms an edge in the prior, we intuitively need to set $\tilde{C}_{ij} = \alpha C_{ij}$ with $\alpha \ge 1$. The other situation is $\|S_{ij}\|_0 = 1$ and $B_{ij} = 0$, meaning the sparse NCA-based model confirms an edge recommended by the CF model that does not appear in the prior networks. For this case, we set $\tilde{C}_{ij} \in [C_{ij}, \max(C)]$, where $\max(C)$ is the largest element in the penalty matrix $C$. In sum,
$$\tilde{C}_{ij} = \alpha C_{ij} \|S_{ij}\|_0 B_{ij} + \beta \|S_{ij}\|_0 (1 - B_{ij}),$$
where $\alpha \ge 1$ and $\beta \in [C_{ij}, \max(C)]$.
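A minimal sketch of this penalty construction, assuming C (prior penalties), S (the sparse NCA-based network), and B (the unweighted prior edges) are matrices of the same shape (illustrative names, not the actual NetREX-CF code):

```python
import numpy as np

def tilde_C(C, S, B, alpha=1.5, beta=None):
    """Compute tilde-C_ij = alpha * C_ij * ||S_ij||_0 * B_ij
                          + beta * ||S_ij||_0 * (1 - B_ij),
    with alpha >= 1 and beta in [C_ij, max(C)]."""
    if beta is None:
        beta = C.max()             # upper end of the allowed range [C_ij, max(C)]
    S0 = (S != 0).astype(float)    # ||S_ij||_0: 1 where the learned edge is kept
    return alpha * C * S0 * B + beta * S0 * (1 - B)
```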

Consensus of Different Parameter Selections
As explained in the previous subsections, for $\eta_{ij}$ and $\tilde{C}_{ij}$ we know the ranges of these parameters but do not know their exact optimal values. For reconstructing the GRN for the yeast experiment, we set